FOODS4ALL HOME
Last Updated: 2020-04-25

流行疫学(Epidemiology)の R Package の、使い方を学ぶつもりで、検索をすると、あまりに、たくさんのものがあることが判明した。ここに挙げるのはほんの一部である。過去の、データを集めたものもあるが、現時点での、Covid-19 の状況にあわせて、利用してみることにする。
これは、学習の記録で、疫学について、または、Covid-19 について、何らかの情報を提供することを目的としていない。データの取得などの、実例と、Package について、調べた備忘録的なものである。

1. R Epidemic Consortium

R の Software などのリンクがある。このサイトがベストかどうかは不明である。

  • Description: The R Epidemics Consortium (RECON) is an international not-for-profit, non-governmental organisation gathering experts in data science, modelling methodology, public health, and software development to create the next generation of analytics tools for informing the response to disease outbreaks, health emergencies and humanitarian crises, using the R software and other free, open-source resources.
  • web: RECON

2. R Packages

流行疫学(Epidemiology)に関する、R の Package をいくつかリストする。これらは、RECON にリストされているものもあるが、されていないものもある。

2.1 epimdr

説明にあるように、本や、coursera(MOOCs)のコースに関連した、例などを含んでいる。現時点では、コースは受講していない。

2.2 earlyR

3.1 Tim Churches のブログで紹介されており、Basic Reproduction Number \(R_0\) を推定できそうなので、使ってみた。(earlyR 応用例)しかし、知識も理解も十分ではないので、いずれ、もう少し学んでから再挑戦。

  • Description: Implements a simple, likelihood-based estimation of the reproduction number (R0) using a branching process with a Poisson likelihood. This model requires knowledge of the serial interval distribution, and dates of symptom onsets. Infectiousness is determined by weighting R0 by the probability mass function of the serial interval on the corresponding day. It is a simplified version of the model introduced by Cori et al. (2013) doi:10.1093/aje/kwt133.
  • web: earlyR: Estimation of Transmissibility in the Early Stages of a Disease Outbreak
  • Manual: earlyR
  • Homepage: Welcome to the earlyR package!
  • Function: get_R: Estimate the Reproduction Number
    • This function estimates the (most of the time, ’basic’) reproduction number (R) using i) the known distribution of the Serial Interval (delay between primary to secondary onset) and ii) incidence data.
    • The estimation of R relies on all available incidence data. As such, all zero incidence after the first case should be included in x. When using inidence from the ’incidence’ package, make sure you use the argument last_date to indicate where the epicurve stops, otherwise the curve is stopped after the last case. Use as.data.frame to double-check that the epicurve includes the last ’zeros’.

2.3 EpiEstim

3.1 Tim Churches のブログで紹介されており、Basic Reproduction Number \(R_0\) を推定するなどできそうなので、使ってみたが、知識も理解も十分ではないので、いずれ、もう少し学んでから再挑戦。

2.4 EpiILM

  • Description: Provides tools for simulating from discrete-time individual level models for infectious disease data analysis. This epidemic model class contains spatial and contact-network based models with two disease types: Susceptible-Infectious (SI) and Susceptible- Infectious-Removed (SIR).
  • web: EpiILM: Spatial and Network Based Individual Level Models for Epidemics
  • Manual: EpiILM

2.5 EpiModel

  • Description: EpiModel is an R package that provides tools for simulating and analyzing mathematical models of infectious disease dynamics. Supported epidemic model classes include deterministic compartmental models, stochastic individual contact models, and stochastic network models. Disease types include SI, SIR, and SIS epidemics with and without demography, with utilities available for expansion to construct and simulate epidemic models of arbitrary complexity. The network model class is based on the statistical framework of temporal exponential random graph models (ERGMs) implementated in the Statnet suite of software for R.
  • web: EpiModel: Mathematical Modeling of Infectious Disease Dynamics
  • Manual: EpiModel
  • vignette: EpiModel: Mathematical Modeling of Infectious Disease Dynamics
  • EpiModel Home: EpiModel

2.6 EpiCurv

2.7 epiR

  • Description: Tools for the analysis of epidemiological data. Contains functions for directly and indirectly adjusting measures of disease frequency, quantifying measures of association on the basis of single or multiple strata of count data presented in a contingency table, and computing confidence intervals around incidence risk and incidence rate estimates. Miscellaneous functions for use in meta-analysis, diagnostic test interpretation, and sample size calculations.
  • web: epiR: Tools for the Analysis of Epidemiological Data
  • Manual: epiR
  • vignette: Descriptive Epidemiology using epiR

2.7 Incidence

3. Website

Covid-19 のこともあり、サイトは限りなくある。例として、R Package を利用するために、参考にしたサイトをリストする。

3.1 COVID-19 epidemiology with R

Code も含まれていたので、実験をしてみた。(earlyR 応用例)時間をとって、いずれまた理解を深めたい。

3.2 Learning Machines

  • web: Learning Machines, A blog about data, science, and learning machines – like us

4. Examples of Analysis

library(tidyverse)
library(rvest)

4.1 Collection of Data

4.1.1 Incidence data collated by John Hopkins University

おそらく、Covid-19 関連では、世界で、最も利用されているデータ。

Data Source

ブログ 3.1 COVID-19 epidemiology with R の Code を確認する。

時間も経過しており、コードを変更しないといけない部分もあった。

We follow the following page. However, the url of the website is changed and hence revised.

Several countries, i.e., Australia, Canada, China, Denmark, France, Netherlands, United Kingtdom, are divided into regions.

library(tidyverse)
library(lubridate)
library(gridExtra)
# library(kableExtra)
#######
jhu_url <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
covid19_global <- read_csv(jhu_url)
covid19_six <- covid19_global %>% 
  rename(Province = "Province/State", Country_Region = "Country/Region") %>% 
  pivot_longer(-c(Province, Country_Region, Lat, Long), 
               names_to = "Date", values_to = "Cumulative_Cases") %>%
  mutate(Reported_Date = as.Date(Date, "%m/%d/%y")) %>% 
  mutate(Daily_Confirmed_Cases = c(0, diff(Cumulative_Cases))) %>% 
  select(Province, Country_Region, Reported_Date, Cumulative_Cases, Daily_Confirmed_Cases) %>%
  filter(Country_Region %in% c("Japan", "Korea, South", "Germany", "Italy", "Spain", "US")) 
japan <- covid19_six %>% filter(Country_Region == "Japan")
jd <- japan %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Daily_Confirmed_Cases)) + geom_col() + 
  ggtitle("Japan: Daily Confirmed Cases")
jc <- japan %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Cumulative_Cases)) + geom_col() +
  ggtitle("Japan: Cumulative Cases")
grid.arrange(jd, jc, ncol=2)

korea <- covid19_six %>% filter(Country_Region == "Korea, South")
kd <- korea %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Daily_Confirmed_Cases)) + geom_col() + 
  ggtitle("Korea: Daily Confirmed Cases")
kc <- korea %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Cumulative_Cases)) + geom_col() +
  ggtitle("Korea: Cumulative Cases")
grid.arrange(kd, kc, ncol=2)

germany <- covid19_six %>% filter(Country_Region == "Germany")
gd <- germany %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Daily_Confirmed_Cases)) + geom_col() + 
  ggtitle("Germany: Daily Confirmed Cases")
gc <- germany %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Cumulative_Cases)) + geom_col() +
  ggtitle("Germany: Cumulative Cases")
grid.arrange(gd, gc, ncol=2)

italy <- covid19_six %>% filter(Country_Region == "Italy")
id <- italy %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Daily_Confirmed_Cases)) + geom_col() + 
  ggtitle("Italy: Daily Confirmed Cases")
ic <- italy %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Cumulative_Cases)) + geom_col() +
  ggtitle("Italy: Cumulative Cases")
grid.arrange(id, ic, ncol=2)

spain <- covid19_six %>% filter(Country_Region == "Spain")
sd <- spain %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Daily_Confirmed_Cases)) + geom_col()+ 
  ggtitle("Spain: Daily Confirmed Cases")
sc <- spain %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Cumulative_Cases)) + geom_col() +
  ggtitle("Spain: Cumulative Cases")
grid.arrange(sd, sc, ncol=2)

us <- covid19_six %>% filter(Country_Region == "US")
usd <- us %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Daily_Confirmed_Cases)) + geom_col()+ 
  ggtitle("US: Daily Confirmed Cases")
usc <- us %>% filter(Reported_Date > "2020-02-01") %>% 
  ggplot(aes(Reported_Date, Cumulative_Cases)) + geom_col() +
  ggtitle("US: Cumulative Cases")
grid.arrange(usd, usc, ncol=2)

4.1.2 Wikipedia

個別の発症者のリスト、Line List が含まれている。

ブログ 3.1 COVID-19 epidemiology with R の Code を確認する。

web scraping をして、それから、wrangling 方法が書かれている。時間も経過しており、最初から、コードも書き直しが必要。また、Data Wrangling(整理)の部分の Code は、すべては、書かれていない。

In the blog above, R script reading the timeline in the wikipedia using rvest package is explained but the code is not perfect and could not use the code to clean the data.

4.1.3 Our World in Data

非常に興味深い様々なデータとそのグラフが掲載されている。データも公開されている。

4.1.4 Japanese Cases

library(tidyverse) 
library(rvest)
library(stringr)
library(lubridate)
library(scales)
url <- "https://www.mhlw.go.jp/stf/newpage_10022.html"
h <- read_html(url)
tab <- h %>% html_nodes("table")
tab_0 <- tab[[3]] %>% html_table 
dat_0 <- tab_0
colnames_0 <- dat_0[1,]
colnames(dat_0) <- colnames_0
dat_0 <- dat_0[2:nrow(dat_0),]
dat_0 <- dat_0[,3:6]
tab_1 <- tab[[4]] %>% html_table 
dat_1 <- tab_1
colnames_1 <- dat_1[1,]
colnames(dat_1) <- colnames_1
dat_1 <- dat_1[2:nrow(dat_1),]
dat_1 <- dat_1[,4:7]
dat <- bind_rows(dat_0, dat_1) %>% arrange(確定日)
dat$居住地 <- dat$居住地 %>% str_replace(pattern = "\r\n\t\t\t\t", replacement = "")
rownames(dat) <- 1:nrow(dat)
dat$確定日 <- as.Date(dat$確定日, "%m/%d")
# dat[dat$年代 %in% c("10代未満", "305"),]
dat$性別[dat$年代 == "10代未満"] <- "男"
dat$年代[dat$年代 == "10代未満"] <- "10歳未満"
dat$年代[dat$年代 == "305"] <- "30代"
data_jp <- dat %>% filter(居住地 != "中国(武漢市)") %>% 
  filter(居住地 != "中国(湖南省)") %>% filter(居住地 != "中国")
data_jp

4.1.5 ジャッグジャパン Jag-Japan

2020年2月17日に公開されたようで、日本でのダッシュボードによる情報提供の最初であると思われる。個人的には、3月に入ってから、この存在を知る。ひとり一人の Line List を含んでおり、日本のもので、まとまったものとしては、一番、情報量が多い。しかし、本来は、まずは、厚生労働省のサイトで、このようなデータを、公開すべきなのだろう。

jag_url <- "https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv"
jag <- read_csv(jag_url)
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   通し = col_double(),
##   市町村内症例番号 = col_double(),
##   人数 = col_double(),
##   累計 = col_double(),
##   前日比 = col_double(),
##   発症数 = col_double(),
##   死者合計 = col_double(),
##   退院数累計 = col_double(),
##   退院数 = col_double(),
##   PCR検査実施人数 = col_double(),
##   PCR検査前日比 = col_double(),
##   X = col_double(),
##   Y = col_double(),
##   Field4 = col_logical(),
##   Field5 = col_logical(),
##   Field6 = col_logical(),
##   Field7 = col_logical(),
##   Field8 = col_logical(),
##   Field9 = col_logical(),
##   Field10 = col_logical()
## )
## See spec(...) for full column specifications.
## Warning: 6 parsing failures.
##  row              col expected        actual                                                               file
## 4464 市町村内症例番号 a double 神戸59/西宮36 'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## 4465 市町村内症例番号 a double 神戸60/西宮37 'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## 4830 市町村内症例番号 a double 千葉無症状70  'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## 4831 市町村内症例番号 a double 千葉無症状71  'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## 5921 市町村内症例番号 a double 千葉無症状72  'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'
## .... ................ ........ ............. ..................................................................
## See problems(...) for more details.
str(jag)
## tibble [12,705 × 51] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ 通し              : num [1:12705] 1 2 3 4 5 6 7 8 9 10 ...
##  $ 厚労省NO          : chr [1:12705] "1" "2" "3" "4" ...
##  $ 無症状病原体保有者: chr [1:12705] NA NA NA NA ...
##  $ 国内              : chr [1:12705] "A-1" "A-2" "A-3" "A-4" ...
##  $ チャーター便      : chr [1:12705] NA NA NA NA ...
##  $ 年代              : chr [1:12705] "30" "40" "30" "40" ...
##  $ 性別              : chr [1:12705] "男性" "男性" "女性" "男性" ...
##  $ 確定日            : chr [1:12705] "1/15/2020" "1/24/2020" "1/25/2020" "1/26/2020" ...
##  $ 発症日            : chr [1:12705] "1/3/2020" "1/14/2020" "1/21/2020" "1/23/2020" ...
##  $ 受診都道府県      : chr [1:12705] "神奈川県" "東京都" "東京都" "愛知県" ...
##  $ 居住都道府県      : chr [1:12705] "神奈川県" "中華人民共和国" "中華人民共和国" "中華人民共和国" ...
##  $ 居住管内          : chr [1:12705] NA NA NA NA ...
##  $ 居住市区町村      : chr [1:12705] NA NA NA NA ...
##  $ キー              : chr [1:12705] "神奈川県" "中華人民共和国" "中華人民共和国" "中華人民共和国" ...
##  $ 発表              : chr [1:12705] "神奈川県" "東京都" "東京都" "愛知県" ...
##  $ 都道府県内症例番号: chr [1:12705] "1" "1" "2" "1" ...
##  $ 市町村内症例番号  : num [1:12705] NA NA NA NA NA NA NA NA NA NA ...
##  $ ステータス        : chr [1:12705] "退院" "退院" "退院" NA ...
##  $ 備考              : chr [1:12705] NA NA NA "国籍:中国" ...
##  $ ソース            : chr [1:12705] "https://www.mhlw.go.jp/stf/newpage_08906.html" "https://www.mhlw.go.jp/stf/newpage_09079.html" "https://www.mhlw.go.jp/stf/newpage_09099.html" "https://www.mhlw.go.jp/stf/newpage_09100.html" ...
##  $ ソース2           : chr [1:12705] "https://www.pref.kanagawa.jp/docs/ga4/bukanshi/occurrence.html" "https://www.metro.tokyo.lg.jp/tosei/hodohappyo/press/2020/01/24/20.html" "https://www.metro.tokyo.lg.jp/tosei/hodohappyo/press/2020/01/27/24.html" "https://www.pref.aichi.jp/uploaded/attachment/321138.pdf" ...
##  $ ソース3           : chr [1:12705] NA NA NA NA ...
##  $ 人数              : num [1:12705] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 累計              : num [1:12705] 1 2 3 4 NA NA 7 8 NA NA ...
##  $ 前日比            : num [1:12705] 1 1 1 1 NA NA 3 1 NA NA ...
##  $ 発症数            : num [1:12705] 0 2 2 3 NA NA 1 2 NA NA ...
##  $ 死者合計          : num [1:12705] NA NA NA NA NA NA NA NA NA NA ...
##  $ 退院数累計        : num [1:12705] 1 NA NA NA NA NA NA NA NA NA ...
##  $ 退院数            : num [1:12705] 1 NA NA NA NA NA NA NA NA NA ...
##  $ PCR検査実施人数   : num [1:12705] NA NA NA NA NA NA NA NA NA NA ...
##  $ PCR検査前日比     : num [1:12705] NA NA NA NA NA NA NA NA NA NA ...
##  $ 職業_正誤確認用   : chr [1:12705] NA NA NA NA ...
##  $ 勤務先_正誤確認用 : chr [1:12705] NA NA NA NA ...
##  $ Hospital Pref     : chr [1:12705] "Kanagawa" "Tokyo" "Tokyo" "Aichi" ...
##  $ Residential Pref  : chr [1:12705] "Kanagawa" "China(Mainland)" "China(Mainland)" "China(Mainland)" ...
##  $ Release           : chr [1:12705] "Kanagawa Prefecture" "Tokyo Metropolitan Government" "Tokyo Metropolitan Government" "Aichi Prefecture" ...
##  $ Gender            : chr [1:12705] "Male" "Male" "Female" "Male" ...
##  $ X                 : num [1:12705] 140 116 116 116 116 ...
##  $ Y                 : num [1:12705] 35.4 39.9 39.9 39.9 39.9 ...
##  $ 確定日YYYYMMDD    : chr [1:12705] "2020/1/15" "2020/1/24" "2020/1/25" "2020/1/26" ...
##  $ 受診都道府県コード: chr [1:12705] "14" "13" "13" "23" ...
##  $ 居住都道府県コード: chr [1:12705] "14" NA NA NA ...
##  $ 更新日時          : chr [1:12705] "4/25/2020 12:57" NA NA NA ...
##  $ Field2            : chr [1:12705] NA NA NA NA ...
##  $ Field4            : logi [1:12705] NA NA NA NA NA NA ...
##  $ Field5            : logi [1:12705] NA NA NA NA NA NA ...
##  $ Field6            : logi [1:12705] NA NA NA NA NA NA ...
##  $ Field7            : logi [1:12705] NA NA NA NA NA NA ...
##  $ Field8            : logi [1:12705] NA NA NA NA NA NA ...
##  $ Field9            : logi [1:12705] NA NA NA NA NA NA ...
##  $ Field10           : logi [1:12705] NA NA NA NA NA NA ...
##  - attr(*, "problems")= tibble [6 × 5] (S3: tbl_df/tbl/data.frame)
##   ..$ row     : int [1:6] 4464 4465 4830 4831 5921 6003
##   ..$ col     : chr [1:6] "市町村内症例番号" "市町村内症例番号" "市町村内症例番号" "市町村内症例番号" ...
##   ..$ expected: chr [1:6] "a double" "a double" "a double" "a double" ...
##   ..$ actual  : chr [1:6] "神戸59/西宮36" "神戸60/西宮37" "千葉無症状70" "千葉無症状71" ...
##   ..$ file    : chr [1:6] "'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'" "'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'" "'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'" "'https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv'" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   通し = col_double(),
##   ..   厚労省NO = col_character(),
##   ..   無症状病原体保有者 = col_character(),
##   ..   国内 = col_character(),
##   ..   チャーター便 = col_character(),
##   ..   年代 = col_character(),
##   ..   性別 = col_character(),
##   ..   確定日 = col_character(),
##   ..   発症日 = col_character(),
##   ..   受診都道府県 = col_character(),
##   ..   居住都道府県 = col_character(),
##   ..   居住管内 = col_character(),
##   ..   居住市区町村 = col_character(),
##   ..   キー = col_character(),
##   ..   発表 = col_character(),
##   ..   都道府県内症例番号 = col_character(),
##   ..   市町村内症例番号 = col_double(),
##   ..   ステータス = col_character(),
##   ..   備考 = col_character(),
##   ..   ソース = col_character(),
##   ..   ソース2 = col_character(),
##   ..   ソース3 = col_character(),
##   ..   人数 = col_double(),
##   ..   累計 = col_double(),
##   ..   前日比 = col_double(),
##   ..   発症数 = col_double(),
##   ..   死者合計 = col_double(),
##   ..   退院数累計 = col_double(),
##   ..   退院数 = col_double(),
##   ..   PCR検査実施人数 = col_double(),
##   ..   PCR検査前日比 = col_double(),
##   ..   職業_正誤確認用 = col_character(),
##   ..   勤務先_正誤確認用 = col_character(),
##   ..   `Hospital Pref` = col_character(),
##   ..   `Residential Pref` = col_character(),
##   ..   Release = col_character(),
##   ..   Gender = col_character(),
##   ..   X = col_double(),
##   ..   Y = col_double(),
##   ..   確定日YYYYMMDD = col_character(),
##   ..   受診都道府県コード = col_character(),
##   ..   居住都道府県コード = col_character(),
##   ..   更新日時 = col_character(),
##   ..   Field2 = col_character(),
##   ..   Field4 = col_logical(),
##   ..   Field5 = col_logical(),
##   ..   Field6 = col_logical(),
##   ..   Field7 = col_logical(),
##   ..   Field8 = col_logical(),
##   ..   Field9 = col_logical(),
##   ..   Field10 = col_logical()
##   .. )
jag

4.1.6 Toyo Keizai 東洋経済データ

すっきりしていて、見やすい。情報が十分であるとは、思わないが、いくつもの、観点から情報をみることができ、コミュニケーションの面からも、すぐれている。

library(jsonlite)
toyokeizai <- fromJSON('https://raw.githubusercontent.com/kaz-ogiwara/covid19/master/data/data.json')
str(toyokeizai)
## List of 5
##  $ updated         :List of 4
##   ..$ last       :List of 2
##   .. ..$ ja: chr "最終更新:2020年4月24日"
##   .. ..$ en: chr "Last updated: 24 April 2020"
##   ..$ transition :List of 2
##   .. ..$ ja: chr "4月24日 12:00時点"
##   .. ..$ en: chr "As of 24 April 12:00"
##   ..$ prefectures:List of 2
##   .. ..$ ja: chr "4月24日 12:00時点"
##   .. ..$ en: chr "As of 24 April 12:00"
##   ..$ demography :List of 2
##   .. ..$ ja: chr "4月23日 18:00時点"
##   .. ..$ en: chr "As of 23 April 18:00"
##  $ transition      :List of 6
##   ..$ carriers  : chr [1:68, 1:6] "2020" "2020" "2020" "2020" ...
##   ..$ cases     : int [1:68, 1:4] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##   ..$ discharged: chr [1:68, 1:5] "2020" "2020" "2020" "2020" ...
##   ..$ serious   : int [1:68, 1:4] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##   ..$ deaths    : chr [1:68, 1:5] "2020" "2020" "2020" "2020" ...
##   ..$ pcrtested : int [1:52, 1:4] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##  $ demography      : int [1:10, 1:3] 120 67 25 11 4 2 0 0 0 3 ...
##  $ prefectures-map :'data.frame':    47 obs. of  4 variables:
##   ..$ code : int [1:47] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ ja   : chr [1:47] "北海道" "青森県" "岩手県" "宮城県" ...
##   ..$ en   : chr [1:47] "Hokkaido" "Aomori" "Iwate" "Miyagi" ...
##   ..$ value: int [1:47] 540 22 0 84 16 65 65 153 53 138 ...
##  $ prefectures-data:List of 4
##   ..$ discharged: int [1:37, 1:50] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##   ..$ deaths    : int [1:37, 1:50] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##   ..$ carriers  : int [1:44, 1:50] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##   ..$ pcrtested : int [1:44, 1:50] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
# head(toyokeizai$transition$cases)
cases <- as.data.frame(toyokeizai$transition$cases)
colnames(cases) <- c("Y", "M", "D", "Cumulative_Cases")
toyokeizai_cases <- cases %>% 
  mutate(Date = as.Date(paste(cases$Y, cases$M, cases$D, sep = "-"), "%Y-%m-%d")) %>%
  mutate(Daily_Cases = c(0, diff(Cumulative_Cases))) %>%
  select(Date, Cumulative_Cases, Daily_Cases)
toyokeizai_cases

4.2 Estimation of the Reproduction Number

一人の感染者が何人の感染者を生み出すかが、Reproduction Number である。最初は、それが、新たな感染者となるが、抗体が生じているひとが多くなると、新たな感染者とはならないため、Reproduction Number は減少する。最初のものを、\(R_0\) と書き、Basic Reproduction Number という。この数とともに、Generation Period として、感染させる可能性が何日維持されるかも重要である。Covid-19 では、感染しても、発症しない、または、軽症で治癒するひとがかなりの割合おり、そのひとたちも、感染源となり得るとされており(このひとたちの Reproduction Number は不明)、感染の広がりを見えなくしている。Basic Reproduction Number も、Generation Period も、実際には、確定が非常に困難である。以下では、検査で陽性だと判明するひとを捕捉することになるが、正確な値をえることは、原理的に不可能であるように思われる。
さらに、検査の実施状況、検査の正確さなども、影響し、なにが把握できて、なにが把握できないかを把握することも困難である。

4.2.1 earlyR

In the folowing we follow earlyR homepage. The earlyR package uses incidence package. See earlyR above.

Note: It is very important to make sure that the last days without cases are included here. Omitting this information would lead to an over-estimation of the reproduction number (R).

このようにあるので、現時点で EarlyR Package で Reproduction Number を、Estimate することは、できない。また、パラメターとして、使用する、mu と sigma も不明である。 例として掲載されている例ともあまりにもかけ離れている。

3.1 の Article では、mean = 5.0 days, standard deviation = 3.4 としているので、これを採用することにする。適用する地域が異なるときに、これでよいかは不明。

Data: data_jp

Japanese data as of March 7, 2020 of Covid-19.

library(earlyR)
library(incidence)
onset <- data_jp$確定日
today <- as.Date("2020-03-07")
i <- incidence(onset, last_date = today)
## 3 missing observations were removed.
i
## <incidence object>
## [296 cases from days 2020-01-15 to 2020-03-07]
## 
## $counts: matrix with 53 rows and 1 columns
## $n: 296 cases in total
## $dates: 53 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 53 days
## $cumulative: FALSE
plot(i, border = "white")

mu <- 5 # mean in days days
sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res
## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 1.141141
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.1504831 0.123495 0.09741672 0.07471344...
## 
##  // $dates:
## [1] "2020-01-16" "2020-01-17" "2020-01-18" "2020-01-19" "2020-01-20"
## [6] "2020-01-21"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312
plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9409  1.1011  1.1411  1.1452  1.1912  1.3714
quantile(R_val, c(0.025, 0.975)) # 95% credibility interval
##     2.5%    97.5% 
## 1.021021 1.281281
hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

Johns Hopkins University Data

First we define a function to find the onset.

date <- as_date(c("2020-01-15", "2020-02-10", "2020-02-12", "2020-02-13"))
class(date)
## [1] "Date"
n <- c(1, 1, 3, 5)

datelist <- function(date, n){
  dat <- date[1]
  for(i in 2:length(date)) dat <- c(dat, rep(date[i],n[i]))
  dat
}

datelist(date, n)
##  [1] "2020-01-15" "2020-02-10" "2020-02-12" "2020-02-12" "2020-02-12"
##  [6] "2020-02-13" "2020-02-13" "2020-02-13" "2020-02-13" "2020-02-13"

Japan

#library(earlyR)
#library(incidence)
onset <- datelist(japan$Reported_Date, japan$Daily_Confirmed_Cases)
# head(onset, 30)
today <- as.Date("2020-03-20")
i <- incidence(onset, last_date = today)
## 11405 observations outside of [2020-01-22, 2020-03-20] were removed.
i
## <incidence object>
## [962 cases from days 2020-01-22 to 2020-03-20]
## 
## $counts: matrix with 59 rows and 1 columns
## $n: 962 cases in total
## $dates: 59 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 59 days
## $cumulative: FALSE
plot(i, border = "white")

#mu <- 5 # mean in days days
#sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res
## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 1.131131
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.1504831 0.123495 0.455961 0.4201881...
## 
##  // $dates:
## [1] "2020-01-23" "2020-01-24" "2020-01-25" "2020-01-26" "2020-01-27"
## [6] "2020-01-28"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312
plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.041   1.111   1.131   1.133   1.151   1.261
hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)


Korea

#library(earlyR)
#library(incidence)
onset <- datelist(korea$Reported_Date, korea$Daily_Confirmed_Cases)
# head(onset, 30)
today <- as.Date("2020-02-21")
i <- incidence(onset, last_date = today)
## 10504 observations outside of [2020-01-22, 2020-02-21] were removed.
i
## <incidence object>
## [204 cases from days 2020-01-22 to 2020-02-21]
## 
## $counts: matrix with 31 rows and 1 columns
## $n: 204 cases in total
## $dates: 31 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 31 days
## $cumulative: FALSE
plot(i, border = "white")

#mu <- 5 # mean in days days
#sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res
## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 2.762763
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.3297552 0.2962323 0.4271719 0.5502179...
## 
##  // $dates:
## [1] "2020-01-23" "2020-01-24" "2020-01-25" "2020-01-26" "2020-01-27"
## [6] "2020-01-28"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312
plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.122   2.653   2.793   2.785   2.913   3.433
hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

Germany

#library(earlyR)
#library(incidence)
onset <- datelist(germany$Reported_Date, germany$Daily_Confirmed_Cases)
# head(onset, 30)
today <- as.Date("2020-03-01")
i <- incidence(onset, last_date = today)
## 152999 observations outside of [2020-01-22, 2020-03-01] were removed.
i
## <incidence object>
## [131 cases from days 2020-01-22 to 2020-03-01]
## 
## $counts: matrix with 40 rows and 1 columns
## $n: 131 cases in total
## $dates: 40 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 40 days
## $cumulative: FALSE
plot(i, border = "white")

#mu <- 5 # mean in days days
#sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res
## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 2.242242
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.1504831 0.123495 0.09741672 0.2539856...
## 
##  // $dates:
## [1] "2020-01-23" "2020-01-24" "2020-01-25" "2020-01-26" "2020-01-27"
## [6] "2020-01-28"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312
plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.682   2.122   2.252   2.252   2.372   2.823
hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

Italy

#library(earlyR)
#library(incidence)
onset <- datelist(italy$Reported_Date, italy$Daily_Confirmed_Cases)
# head(onset, 30)
today <- as.Date("2020-02-25")
i <- incidence(onset, last_date = today)
## 189651 observations outside of [2020-01-22, 2020-02-25] were removed.
i
## <incidence object>
## [323 cases from days 2020-01-22 to 2020-02-25]
## 
## $counts: matrix with 35 rows and 1 columns
## $n: 323 cases in total
## $dates: 35 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 35 days
## $cumulative: FALSE
plot(i, border = "white")

#mu <- 5 # mean in days days
#sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res
## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 2.442442
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.1504831 0.123495 0.09741672 0.07471344...
## 
##  // $dates:
## [1] "2020-01-23" "2020-01-24" "2020-01-25" "2020-01-26" "2020-01-27"
## [6] "2020-01-28"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312
plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.042   2.352   2.442   2.450   2.543   2.903
hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

Spain

#library(earlyR)
#library(incidence)
onset <- datelist(spain$Reported_Date, spain$Daily_Confirmed_Cases)
# head(onset, 30)
today <- as.Date("2020-03-01")
i <- incidence(onset, last_date = today)
## 212940 observations outside of [2020-01-22, 2020-03-01] were removed.
i
## <incidence object>
## [85 cases from days 2020-01-22 to 2020-03-01]
## 
## $counts: matrix with 40 rows and 1 columns
## $n: 85 cases in total
## $dates: 40 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 40 days
## $cumulative: FALSE
plot(i, border = "white")

#mu <- 5 # mean in days days
#sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res
## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 2.572573
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.1504831 0.123495 0.09741672 0.07471344...
## 
##  // $dates:
## [1] "2020-01-23" "2020-01-24" "2020-01-25" "2020-01-26" "2020-01-27"
## [6] "2020-01-28"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312
plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.882   2.402   2.603   2.606   2.793   3.654
hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

US

#library(earlyR)
#library(incidence)
onset <- datelist(us$Reported_Date, us$Daily_Confirmed_Cases)
# head(onset, 30)
today <- as.Date("2020-03-10")
i <- incidence(onset, last_date = today)
## 868211 observations outside of [2020-01-22, 2020-03-10] were removed.
i
## <incidence object>
## [959 cases from days 2020-01-22 to 2020-03-10]
## 
## $counts: matrix with 49 rows and 1 columns
## $n: 959 cases in total
## $dates: 49 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 49 days
## $cumulative: FALSE
plot(i, border = "white")

#mu <- 5 # mean in days days
#sigma <- 3.4 # standard deviation in days
res <- get_R(i, si_mean = mu, si_sd = sigma)
res
## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 2.072072
## 
## 
##  // $lambda:
##   0.1792721 0.1727373 0.3297552 0.2962323 0.7857161 0.7164204...
## 
##  // $dates:
## [1] "2020-01-23" "2020-01-24" "2020-01-25" "2020-01-26" "2020-01-27"
## [6] "2020-01-28"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.16262975778547
##     scale: 2.312
plot(res)

R_val <- sample_R(res, 1000)
summary(R_val) # basic stats
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.872   2.032   2.072   2.075   2.122   2.282
hist(R_val, border = "grey", col = "navy",
     xlab = "Values of R",
     main = "Sample of likely R values")

plot(res, "lambdas", scale = length(onset) + 1)
abline(v = onset, lwd = 3, col = "grey")
abline(v = today, col = "blue", lty = 2, lwd = 2)
points(onset, seq_along(onset), pch = 20, cex = 3)

4.3 EpiEstim

3.1 Tim Churches のブログで紹介されている。(earlyR 応用例)しかし、知識も理解も十分ではないので、いずれ、もう少し学んでから再挑戦。時間をみつけて、取り組む。

We follow EpiEstim: vignette. See EpiEstim above.